Brandt's GLR method & refined HMM segmentation for TTS synthesis application
Abstract
This paper compares two automatic segmentation algorithms against the standard HMM (Hidden Markov Model) with forced alignment, from two points of view: the probabilities of insertion and omission, and the accuracy. The first algorithm, hereafter called the refined HMM algorithm, refines the segmentation produced by the standard HMM by means of a GMM (Gaussian Mixture Model) for each boundary. The second is Brandt’s GLR (Generalized Likelihood Ratio) method, whose goal is to detect signal discontinuities. Provided that the sequence of speech units is known, the experimental results presented in this paper suggest combining the refined HMM algorithm with Brandt’s GLR method and with other algorithms adapted to the detection of boundaries between known acoustic classes.
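As a rough illustration of how a generalized likelihood ratio can flag a signal discontinuity, the sketch below compares the hypothesis "one homogeneous segment" with "two segments split at sample k". It is a simplification and not the paper's implementation: Brandt's method is usually described with autoregressive models fitted to each window, whereas this sketch assumes a plain zero-mean Gaussian model in which only the variance changes. The function names and the `min_len` parameter are hypothetical.

```python
import math
import random

def variance(xs):
    """MLE variance of a zero-mean Gaussian, floored to avoid log(0)."""
    return max(sum(x * x for x in xs) / len(xs), 1e-12)

def glr_statistic(window, split):
    """Log generalized likelihood ratio for a change at index `split`.

    H0: the whole window is one zero-mean Gaussian segment.
    H1: the samples before and after `split` have different variances.
    (Assumption: variance-only model, not the AR model of Brandt's method.)
    """
    left, right = window[:split], window[split:]
    n, n1, n2 = len(window), len(left), len(right)
    return 0.5 * (n * math.log(variance(window))
                  - n1 * math.log(variance(left))
                  - n2 * math.log(variance(right)))

def detect_discontinuity(signal, min_len=20):
    """Return (best_split, glr_value) over all admissible split points."""
    candidates = range(min_len, len(signal) - min_len + 1)
    best = max(candidates, key=lambda k: glr_statistic(signal, k))
    return best, glr_statistic(signal, best)

# Synthetic example: the standard deviation jumps from 1 to 5 at sample 200.
rng = random.Random(0)
signal = ([rng.gauss(0.0, 1.0) for _ in range(200)]
          + [rng.gauss(0.0, 5.0) for _ in range(200)])
split, score = detect_discontinuity(signal)
print(split, round(score, 1))
```

In practice the statistic would be compared with a threshold, and the choice of threshold trades insertions (spurious boundaries) against omissions (missed boundaries), the two error types the abstract mentions.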
Related articles
A fusion approach for automatic speech segmentation of large corpora with application to speech synthesis
This paper deals with the automatic segmentation of large speech corpora in the case when the phonetic sequence corresponding to the speech signal is known. A direct and typical application is corpus-based Text-To-Speech (TTS) synthesis. We start by proposing a general approach for combining several segmentations produced by different algorithms. Then, we describe and analyse three automatic se...
Deep Learning Techniques in Tandem with Signal Processing Cues for Phonetic Segmentation for Text to Speech Synthesis in Indian Languages
Automatic detection of phoneme boundaries is an important sub-task in building speech processing applications, especially text-to-speech synthesis (TTS) systems. The main drawback of the Gaussian mixture model hidden Markov model (GMM-HMM) based forced-alignment is that the phoneme boundaries are not explicitly modeled. In an earlier work, we had proposed the use of signal processing cues in tan...
An HMM trajectory tiling (HTT) approach to high quality TTS
We propose an HMM Trajectory Tiling (HTT) approach to high quality TTS, which is our entry to Blizzard Challenge 2010. In HTT, first refined HMM is trained with the Minimum Generation Error (MGE) criterion; then trajectory generated by the refined HMM is to guide the search for finding the closest waveform segment “tiles” in synthesis. Normalized distances between HMM trajectory and those of th...
Decision Tree Classification Approach for Model Selection in Segmenting Mandarin TTS Corpus
High accuracy automatic segmentation of Mandarin TTS (text to speech) corpus is vital for obtaining high quality syllable’s boundary to corpus-based speech synthesis. Among the existing methods, most studies on automatic segmentation are based upon single model, ignoring the diverse time marks gained by different models in specific Mandarin boundary environment. In this paper, three hidden Marko...
An Improved Automatic EEG Signal Segmentation Method based on Generalized Likelihood Ratio
It is often needed to label electroencephalogram (EEG) signals by segments of similar characteristics that are particularly meaningful to clinicians and for assessment by neurophysiologists. Within each segment, the signals are considered statistically stationary, usually with similar characteristics such as amplitude and/or frequency. In order to detect the segment boundaries of a signal, we ...